1. MAIN DISCUSSION

The concept of influence function of an estimator was originally coined in the theory of robust statistics, and as an asymptotic influence function played a role in the development of semiparametric statistics ([2, 3]). If an estimator $T_n$ of a quantity $\mu$, based on a random sample of observations $X_1, X_2, \ldots, X_n$, possesses an asymptotic expansion of the form

(1.1)  $T_n = \mu + \frac{1}{n}\sum_{i=1}^n \psi(X_i) + o_P(n^{-1/2}),$

then the function $\psi$ is its asymptotic influence function. The name derives from the fact that if an observation $X_i$ is replaced by a value $x$, then the change in the estimator is $n^{-1}(\psi(x) - \psi(X_i))$, at least if the remainder term $o_P(n^{-1/2})$ is neglected. The estimator is “asymptotically robust” if this change is bounded in $x$, that is, if the influence function $\psi$ is bounded.

Semiparametric theory as developed in the 1980s/90s was not concerned with robustness, but with efficient estimation. Provided that the variables $\psi(X_i)$ have zero mean and finite variance, the expansion (1.1) implies that the sequence $\sqrt{n}(T_n - \mu)$ is asymptotically normally distributed with mean zero. Among different asymptotically unbiased estimators, the ones with small asymptotic variance are preferred. Semiparametric lower bound theory showed that under so-called “asymptotic regularity” estimators with an expansion (1.1) with the efficient influence function attain the smallest variance. Furthermore, it showed how to compute the latter function from the tangent space of the underlying semiparametric model ([1, 4, 7], and [17]).

Higher order tangent spaces and influence functions are generalizations of these concepts, but were developed by Robins et al. [9] from the perspective of constructing estimators rather than asymptotic efficiency. Thus, it will be fruitful to also give the definitions of influence functions and tangent spaces from the point of view of constructing estimators.

Assume that the observations $X_1, \ldots, X_n$ are a random sample from a distribution with density $p_\eta$ relative to a measure $\mu$ on a sample space $(\mathcal{X}, \mathcal{A})$. The parameter $\eta$ is known to belong to a subset $\mathcal{H}$ of a normed space, and it is desired to estimate the value $\chi(\eta)$ of a functional $\chi: \mathcal{H} \to \mathbb{R}$. Interest is in the situation of a semiparametric or nonparametric model, where $\mathcal{H}$ is infinite-dimensional and the dependence $\eta \mapsto p_\eta$ is assumed smooth (as in [16]).

Given a “consistent” initial estimator $\hat\eta$ of $\eta$, the “plug-in estimator” $\chi(\hat\eta)$ is typically consistent for the parameter of interest $\chi(\eta)$, but it may not be a good estimator. In particular, if $\hat\eta$ is a general purpose estimator, not specially constructed to yield a good plug-in, then $\chi(\hat\eta)$ will often have a suboptimal precision. To gain insight in this situation assume that the parameter permits a Taylor expansion of the form

(1.2)  $\chi(\eta) = \chi(\hat\eta) + \chi'_{\hat\eta}(\eta - \hat\eta) + O(\|\eta - \hat\eta\|^2).$

Such an expansion suggests that the plug-in estimator will have an error of the order $O_P(\|\eta - \hat\eta\|)$, unless the linear term $\chi'_{\hat\eta}(\eta - \hat\eta)$ in the expansion vanishes and the error has the square of this order. For a large parameter set, the latter estimation error will typically be large.

The expansion (1.2) also suggests that better estimators can be obtained by “estimating” the linear term. To achieve this assume a “generalized von Mises representation” of the derivative of the form

(1.3)  $\chi'_{\hat\eta}(\eta - \hat\eta) = \int \chi^{(1)}_{\hat\eta}\, d(P_\eta - P_{\hat\eta}) = P_\eta \chi^{(1)}_{\hat\eta} + O(\|\eta - \hat\eta\|^2),$

for some measurable function $\chi^{(1)}_\eta: \mathcal{X} \to \mathbb{R}$. Here $Pf$ is short for the integral $\int f\, dP$, and it is assumed that $P_\eta \chi^{(1)}_\eta = 0$ for every $\eta$ [which can always be arranged by a recentering, as $\int 1\, d(P_\eta - P_{\hat\eta}) = 0$]. The von Mises representation (1.3) and (1.2) suggest the “corrected plug-in estimator”

(1.4)  $T_n = \chi(\hat\eta) + \mathbb{P}_n \chi^{(1)}_{\hat\eta},$

where $\mathbb{P}_n f = n^{-1}\sum_{i=1}^n f(X_i)$ is the expectation of a function $f$ under the empirical measure $\mathbb{P}_n$. It is reasonable to assume that $(\mathbb{P}_n - P_\eta)\chi^{(1)}_{\hat\eta}$ is asymptotically equivalent to $(\mathbb{P}_n - P_\eta)\chi^{(1)}_\eta$ up to the order $o_P(n^{-1/2})$, as the difference $(\mathbb{P}_n - P_\eta)(\chi^{(1)}_{\hat\eta} - \chi^{(1)}_\eta)$ is “centered” and ought to have “variance” of the order $o(1/n)$. (We put “centered” and “variance” in quotes because the randomness in the initial estimator $\hat\eta$ prevents a simple calculation of mean and variance.) Thus, under reasonable regularity conditions the corrected plug-in estimator (1.4) will satisfy

(1.5)  $T_n - \chi(\eta) = \chi(\hat\eta) - \chi(\eta) + P_\eta \chi^{(1)}_{\hat\eta} + (\mathbb{P}_n - P_\eta)\chi^{(1)}_{\hat\eta} = O(\|\hat\eta - \eta\|^2) + (\mathbb{P}_n - P_\eta)\chi^{(1)}_\eta + o_P(n^{-1/2}).$

If the first term on the right is sufficiently small, specifically $\|\hat\eta - \eta\| = o_P(n^{-1/4})$, then $T_n$ satisfies (1.1) with $\chi^{(1)}_\eta$ as the influence function.

The improvement of the estimator (1.4) over the ordinary plug-in estimator is that the estimation error $\|\hat\eta - \eta\|$ need have order $o_P(n^{-1/4})$ rather than $O_P(n^{-1/2})$ for the estimator to have error $O_P(n^{-1/2})$. For small “parametric” models this is not very relevant, but for semi- or nonparametric models the gain can be substantial. For instance, if $\hat\eta$ involves an ordinary smoothing estimator of a regression function on a $d$-dimensional domain, then a typical rate of estimation is $n^{-\alpha/(2\alpha+d)}$, for $\alpha$ the number of derivatives of the true regression function. This is never $O_P(n^{-1/2})$, but $o_P(n^{-1/4})$ for $\alpha > d/2$.
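
The threshold $\alpha > d/2$ follows from comparing the exponent $\alpha/(2\alpha+d)$ with $1/4$. As a small illustration (the function names are ours, not from the paper), the comparison can be tabulated for a few smoothness levels and dimensions:

```python
def nonparametric_rate_exponent(alpha: float, d: int) -> float:
    """Exponent r of the typical rate n**(-r) for an alpha-smooth
    regression function on a d-dimensional domain."""
    return alpha / (2 * alpha + d)


def fast_enough_for_first_order_correction(alpha: float, d: int) -> bool:
    """True if n**(-alpha/(2*alpha+d)) is o(n**(-1/4)), i.e. exponent > 1/4."""
    return nonparametric_rate_exponent(alpha, d) > 0.25


for alpha, d in [(1.0, 1), (2.0, 4), (3.0, 4), (2.0, 10)]:
    r = nonparametric_rate_exponent(alpha, d)
    print(alpha, d, round(r, 3), fast_enough_for_first_order_correction(alpha, d))
```

The exponent stays strictly below $1/2$ for every finite $\alpha$, which is the sense in which the rate is never $O_P(n^{-1/2})$.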

The function $\chi^{(1)}_\eta$ in the von Mises representation (1.3) is exactly an “influence function” as in the theory of semiparametric models (see [2, 4, 7, 17]) and can be related to the “tangent set”. Informally, a tangent set (at $P_\eta$) of a model $(P_\eta: \eta \in \mathcal{H})$ is the set of all score functions at $t = 0$,

(1.6)  $g_\eta = \frac{d}{dt}\Big|_{t=0} \log p_{\eta_t},$

of (smooth) one-dimensional submodels $(P_{\eta_t}: t \ge 0)$ with $\eta_0 = \eta$. [Here $t \mapsto \eta_t$ is a map from a neighbourhood of $0 \in \mathbb{R}$ to $\mathcal{H}$ such that the derivative (1.6) exists.] An influence function [of the real parameter $\chi(\eta)$ at $P_\eta$] is defined as a measurable map $x \mapsto \chi^{(1)}_\eta(x)$ such that, for all paths $t \mapsto \eta_t$ considered,

(1.7)  $\frac{d}{dt}\Big|_{t=0} \chi(\eta_t) = P_\eta \chi^{(1)}_\eta g_\eta.$

Combining (1.2)-(1.3) (with $\eta_t$ in the role of $\eta$ and $\eta$ in the role of $\hat\eta$), we see that $\chi(\eta_t)$ is to the first order given by $\chi(\eta) + P_{\eta_t}\chi^{(1)}_\eta$. Since, according to (1.6), $g_\eta\, dP_\eta$ is the derivative at $t = 0$ of $dP_{\eta_t}$, we next conclude that the function $\chi^{(1)}_\eta$ in the von Mises expansion (1.3) is an influence function also in the sense of (1.7).

An influence function is not necessarily unique, as only its inner products with elements $g_\eta$ of the tangent set matter. An influence function that is contained in the closed linear span of the tangent set is called the efficient influence function. It minimizes the variance $\mathrm{var}_\eta\, \mathbb{P}_n \chi^{(1)}_\eta$ over all influence functions and is the influence function of asymptotically efficient estimators.

The theory developed by Robins et al. in [9] extends the preceding from linear to higher order approximations. The motivation is that the parameter $\eta$ may be so high dimensional that no estimator $\hat\eta$ attains the rate $o_P(n^{-1/4})$. The preceding suggests that then the corrected plug-in estimator will be suboptimal, as in the expansion (1.5) the “bias” $\chi(\hat\eta) - \chi(\eta) + P_\eta \chi^{(1)}_{\hat\eta}$ dominates the “variance” $(\mathbb{P}_n - P_\eta)\chi^{(1)}_{\hat\eta}$. For this situation Robins et al. [9] introduced higher order expansions and influence functions, as follows.
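A tangent set of order $m$ (at $P_\eta$) consists of all derivatives of the type, for given one-dimensional submodels $(P_{\eta_t}: t \ge 0)$,

(1.8)  $g^{(j)}_\eta(x_1, \ldots, x_m) = \frac{1}{\prod_{i=1}^m p_\eta(x_i)}\, \frac{d^j}{dt^j}\Big|_{t=0} \prod_{i=1}^m p_{\eta_t}(x_i), \qquad j = 1, \ldots, m.$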


The functions on the right-hand side are higher order score functions ([6, 14]). These are defined relative to the joint density $(x_1, \ldots, x_m) \mapsto \prod_{i=1}^m p_\eta(x_i)$ of $m$ observations, not as higher order derivatives of a single density, because higher order derivatives of the log likelihood of $n$ observations do not reduce to sums over single observations, as do first order derivatives. The relationship between expansions on a single observation and the joint likelihood can be seen from



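$\prod_{i=1}^n \Bigl(1 + t\, g_\eta(x_i) + \tfrac{1}{2} t^2\, \ddot g_\eta(x_i) + \cdots\Bigr) = 1 + t \sum_{i=1}^n g_\eta(x_i) + t^2 \Bigl[\tfrac{1}{2}\sum_{i=1}^n \ddot g_\eta(x_i) + \mathop{\sum\sum}_{i<j} g_\eta(x_i)\, g_\eta(x_j)\Bigr] + \cdots.$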


Inspection of this expansion shows that the coefficient of $t^j$ is a $U$-statistic of degree $j$ [cf. equation (1.11) below]. The kernels of these $U$-statistics up to order $m$ can also be obtained as higher order derivatives of products of $m$ densities, as in (1.8). Furthermore, they are degenerate in the sense that the integral of a kernel with respect to a single coordinate relative to the true density $p_\eta$ is zero, generalizing the property that a score function has mean zero; equivalently, this property can be described as orthogonality of higher order score functions relative to lower order score functions.
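
The structure of the low order coefficients can be checked numerically. The sketch below is only an illustration: the arrays `a` and `b` are arbitrary stand-ins for the values at the observations of the terms multiplying $t$ and $t^2$ in each factor, and the product is expanded by polynomial multiplication.

```python
from itertools import combinations

import numpy as np


def product_coefficients(a, b):
    """Coefficients in t (lowest order first) of prod_i (1 + t*a[i] + t**2*b[i])."""
    coeffs = np.array([1.0])
    for a_i, b_i in zip(a, b):
        # Multiplying polynomials is convolving their coefficient sequences.
        coeffs = np.convolve(coeffs, [1.0, a_i, b_i])
    return coeffs


rng = np.random.default_rng(0)
a = rng.normal(size=6)  # stand-ins for the first order terms of the factors
b = rng.normal(size=6)  # stand-ins for the second order terms of the factors
coeffs = product_coefficients(a, b)

# Coefficient of t: a sum over single observations.
linear = a.sum()
# Coefficient of t**2: single-observation terms plus all pairs i < j.
quadratic = b.sum() + sum(a[i] * a[j] for i, j in combinations(range(len(a)), 2))

print(np.isclose(coeffs[1], linear), np.isclose(coeffs[2], quadratic))
```

The coefficient of $t^2$ contains a sum over pairs of distinct observations, which is what makes it a statistic of degree two rather than a sum over single observations.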

Correspondingly, an influence function of order $m$ [of the map $\eta \mapsto \chi(\eta)$ at $P_\eta$] is a measurable map $(x_1, \ldots, x_m) \mapsto \chi_\eta(x_1, \ldots, x_m)$ such that, for every given one-dimensional submodel $(P_{\eta_t}: t \ge 0)$,

(1.9)  $\frac{d^j}{dt^j}\Big|_{t=0} \chi(\eta_t) = P^m_\eta \chi_\eta\, g^{(j)}_\eta, \qquad j = 1, 2, \ldots, m.$

This influence function is determined only up to its inner products with the tangent set and hence is not unique. A minimal version could be defined as one such that the variance of the $U$-statistic with kernel $\chi_\eta$ is minimal.

For computation in examples the defining equations (1.9) of a higher order influence function can be tedious. It is usually easier to apply the rule that a higher order derivative is the derivative of the previous order derivative (as shown for second order influence functions in [8], 4.3.11). One computes the first order influence function $x_1 \mapsto \chi^{(1)}_\eta(x_1)$ of the functional $\eta \mapsto \chi(\eta)$ as usual. Next one recursively for $j = 2, 3, \ldots, m$ determines influence functions, written $x_j \mapsto \chi^{(j)}_\eta(x_1, \ldots, x_j)$, as influence functions of the functionals $\eta \mapsto \chi^{(j-1)}_\eta(x_1, \ldots, x_{j-1})$, for fixed $(x_1, \ldots, x_{j-1})$. The function $\chi^{(j)}_\eta$ can be made degenerate (in the sense defined previously) by subtracting its projection on the linear span of all functions of one argument less. Then



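$\chi_\eta(x_1, \ldots, x_m) = \chi^{(1)}_\eta(x_1) + \frac{1}{2!}\chi^{(2)}_\eta(x_1, x_2) + \cdots + \frac{1}{m!}\chi^{(m)}_\eta(x_1, \ldots, x_m)$

is an $m$th order influence function. As we consider only a single value of $m$ at a time, we do not let $m$ show up in the notation on the left. As a consequence, the formulas in the following will appear as in the linear case.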

Given an influence function of order $m$, we may now generalize the definition of the improved plug-in estimator (1.4) to

(1.10)  $T_n = \chi(\hat\eta) + \mathbb{U}_n \chi_{\hat\eta},$

for $\mathbb{U}_n f$ denoting a $U$-statistic of order $m$ with kernel $f$:

(1.11)  $\mathbb{U}_n f = \frac{1}{n(n-1)\cdots(n-m+1)} \sum_{1 \le i_1 \ne i_2 \ne \cdots \ne i_m \le n} f(X_{i_1}, \ldots, X_{i_m}).$

The term $\mathbb{U}_n \chi_{\hat\eta}$ should correct the plug-in estimator $\chi(\hat\eta)$ up to order $m$ and, hence, an argument similar to (1.5) should give the expansion

(1.12)  $T_n - \chi(\eta) = O(\|\hat\eta - \eta\|^{m+1}) + (\mathbb{U}_n - P^m_\eta)\chi_\eta + o_P(n^{-1/2}).$

The bias of the plug-in estimator $\chi(\hat\eta)$ would be corrected to the order $O(\|\eta - \hat\eta\|^{m+1})$, and good estimators for $\chi(\eta)$ exist even in situations where $\eta$ is estimable only with low precision. The only cost would be a slightly larger variance in the $U$-statistic relative to the empirical measure.
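
For concreteness, a $U$-statistic of order $m$ as in (1.11) can be computed by brute force as an average over all ordered $m$-tuples of distinct sample points. The following is a minimal sketch (the function name is ours); it costs of the order $n^m$ kernel evaluations and is meant only to fix ideas, not as an efficient implementation.

```python
from itertools import permutations
from math import perm


def u_statistic(kernel, sample, m):
    """Average of `kernel` over all ordered m-tuples of sample points
    taken at distinct positions, as in (1.11)."""
    n = len(sample)
    if m > n:
        raise ValueError("the order m cannot exceed the sample size n")
    total = sum(kernel(*points) for points in permutations(sample, m))
    return total / perm(n, m)  # perm(n, m) = n (n-1) ... (n-m+1)


sample = [1.0, 2.0, 3.0, 4.0]
print(u_statistic(lambda x: x, sample, 1))         # order 1: the sample mean
print(u_statistic(lambda x, y: x * y, sample, 2))  # order 2: product kernel
```

For the product kernel the result equals $\bigl((\sum_i x_i)^2 - \sum_i x_i^2\bigr)/(n(n-1))$, since only pairs with distinct indices enter the sum.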

Unfortunately, there is no such free lunch; one cannot seriously correct bias without seriously increasing the variance. Although (1.12) and the preceding heuristics are correct, they do not apply, as higher order influence functions typically do not exist. Besides by a lack of invertibility of the map $\eta \mapsto P_\eta$, this is caused by failure of a higher order von Mises type representation. Whereas a continuous, linear map $B: L_2(P_\eta) \to \mathbb{R}$, such as arises from the first derivative $\chi'_\eta$, is always representable as an inner product $B(g) = P_\eta \chi^{(1)}_\eta g$ for some function $\chi^{(1)}_\eta$, a continuous, multilinear map $B: L_2(P_\eta)^j \to \mathbb{R}$ is not necessarily representable as a repeated integral of the type

$B(g_1, \ldots, g_j) = \int \cdots \int \chi_\eta(x_1, \ldots, x_j)\, g_1(x_1) \cdots g_j(x_j)\, dP_\eta(x_1) \cdots dP_\eta(x_j).$

The definition (1.10) uses such a “von Mises representation” in order to estimate the higher derivatives using the data, by a $U$-statistic.

We must therefore set a more modest aim: correcting the bias in certain directions only. A key observation is that a multilinear map on a finite-dimensional subspace $L \times \cdots \times L \subset L_2(P_\eta)^m$ is always representable by a kernel. If the invertibility $\eta \leftrightarrow P_\eta$ can be resolved, we can therefore always “represent” and estimate the $m$th order derivative at differences $\eta - \hat\eta$ within a given finite-dimensional linear space. The bias in nonrepresented directions then remains, and the challenge is to determine the directions that balance three terms:

• the bias in the nonrepresented directions, the representation bias,

• the estimation error $O_P(\|\hat\eta - \eta\|^{m+1})$, the estimation bias,

• the variance of the resulting $U$-statistic.
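
The key observation about finite-dimensional subspaces can be made explicit. As a sketch, assume (our notation, not the paper's) that $e_1, \ldots, e_k$ is an orthonormal basis of $L$ in $L_2(P_\eta)$. Expanding each argument in this basis and using multilinearity gives a kernel directly:

```latex
% Assumption: e_1, ..., e_k is an orthonormal basis of L in L_2(P_\eta),
% so that g = \sum_i (P_\eta g e_i) e_i for every g in L.
\begin{align*}
B(g_1,\ldots,g_m)
  &= \sum_{i_1=1}^k \cdots \sum_{i_m=1}^k
     B(e_{i_1},\ldots,e_{i_m}) \prod_{l=1}^m P_\eta(g_l\, e_{i_l}) \\
  &= \int\cdots\int K(x_1,\ldots,x_m)\, g_1(x_1)\cdots g_m(x_m)\,
     dP_\eta(x_1)\cdots dP_\eta(x_m), \\
K(x_1,\ldots,x_m)
  &= \sum_{i_1=1}^k \cdots \sum_{i_m=1}^k
     B(e_{i_1},\ldots,e_{i_m})\, e_{i_1}(x_1)\cdots e_{i_m}(x_m).
\end{align*}
```

The kernel $K$ involves $k^m$ coefficients, which gives a first indication of why its complexity, and with it the variance of the corresponding $U$-statistic, grows as more directions are represented.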

Regarding the third component, we note that, although the variance of a $U$-statistic with a fixed kernel is dominated by its linear term and is of order $O(1/n)$, the need to represent the functionals in more and more directions given larger sample size $n$ results in kernels that become more and more complex with $n$. The resulting variance of $\mathbb{U}_n \chi_{\hat\eta}$ is therefore typically larger than $O(1/n)$. A new balance should be found with the squared biases, which will also be larger than parametric.

The preceding heuristic scheme is general, but its implementation requires finding the appropriate influence functions that create the correct bias-variance trade-off. Robins et al. [9] achieved this for estimating a functional in a class of high-dimensional semiparametric models that includes some popular models for missing data or causal inference. The high dimensions arise by the inclusion of a multivariate “control covariate”. The models have a technical characterization, through a certain form of the first order influence function. They are structured semiparametric models in that their natural parameterization is in terms of three or more parameters, which vary independently. Thus, the full parameter takes the form $\eta = (a, b, c, f)$, that is, it is partitioned in the four subparameters $a$, $b$, $c$ and $f$. The parameter $f$ is the marginal density of an observable covariate $Z$. The technical characterization is that the first order influence function of the parameter of interest $\eta \mapsto \chi(\eta)$ can be written in the form

(1.13)  $\chi^{(1)}_\eta(x) = a(z)b(z)S_1(x) + a(z)S_2(x) + b(z)S_3(x) + S_4(x) - \chi(\eta),$

for known functions $S_j(x)$ of the data [i.e., $S = (S_1, S_2, S_3, S_4)$ is a given statistic]. The covariate $Z$ is assumed to range over a compact $d$-dimensional domain and the parameters $a$, $b$, $f$ are unknown functions on this domain, restricted only nonparametrically by smoothness assumptions. The parameter $c$ is an additional parameter to complete the identification of the distribution of $X$, but it does not appear in (1.13).

As the higher order corrections are based on von Mises representations of higher order influence functions, which are derivatives of the first order influence function, it is not unnatural to base a theory on the form of the first order influence function. However, by itself (1.13) appears not insightful. The following examples illustrate the class of models.

Example 1.1 (Missing data). In a version of the missing data problem we observe the triple $X = (YA, A, Z)$, where $Y$ and $A$ are random variables that take values in the two-point set $\{0, 1\}$ and that are conditionally independent given the variable $Z$. We can think of $Y$ as a response, which is observed only if the indicator $A$ takes the value 1.



To ensure independence of the response and missingness, the covariate $Z$ would be chosen such that it contains all information on the dependence between $Y$ and $A$ (“missing at random”). Alternatively, we can think of $Y$ as a counterfactual outcome if a treatment were given ($A = 1$) and estimate (half) the treatment effect under the assumption of “no unmeasured confounders”. Both applications may require that $Z$ is high dimensional (e.g., of dimension 10), where there is typically insufficient a priori information to model the form of the dependence of $A$ and $Y$ on $Z$. The three parameters are the marginal density $f$ of $Z$ and the (inverse) probabilities $b(z) = P(Y = 1 \mid Z = z)$ and $a(z)^{-1} = P(A = 1 \mid Z = z)$. The functional of interest is the mean response $EY$, that is,

$\chi(\eta) = \int b f\, d\nu.$

The representation (1.13) can be shown to be valid with $S_1 = -A$, $S_2 = AY$, $S_3 = 1$ and $S_4 = 0$ (see, e.g., [10]). The parameters $a$ and $b$ are (transformed) regression functions and are nonparametrically estimable at the rates $n^{-\alpha/(2\alpha+d)}$ and $n^{-\beta/(2\beta+d)}$ if they are a priori known to be $\alpha$- and $\beta$-smooth, where $d$ is the dimension of $Z$. The parameter $f$ is a density and can be estimated from the covariates. Closer inspection [see (1.14) below] shows that a more crucial parameter is the quotient $f/a$, which is proportional to the conditional density of $Z$ given $A = 1$ and can be estimated directly from the observed covariates and treatment indicators, at a rate $n^{-\gamma/(2\gamma+d)}$ if this function is known to be $\gamma$-smooth. The purpose of constructing higher order influence functions is to ensure that standard nonparametric regression or density estimators can replace the unknown parameters in theoretical expressions with optimal estimators as a result.
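
To make Example 1.1 concrete, the following simulation sketch evaluates the corrected plug-in estimator (1.4) with the influence function (1.13) for $S_1 = -A$, $S_2 = AY$, $S_3 = 1$, $S_4 = 0$. Everything specific here is an assumption made for illustration: the data-generating functions, the sample size, the deliberately poor initial estimate of $b$, and the use of the empirical distribution of $Z$ in place of a density estimate when computing the plug-in $\chi(\hat\eta)$.

```python
import numpy as np


def corrected_plug_in(a_hat, b_hat, A, YA):
    """Estimator (1.4) for Example 1.1; a_hat and b_hat are the initial
    estimates evaluated at the observed covariates Z_1, ..., Z_n.

    Assumption: chi(eta_hat) integrates b_hat against the empirical
    distribution of Z rather than against a density estimate f_hat.
    """
    plug_in = b_hat.mean()
    # (1.13) with S1 = -A, S2 = AY, S3 = 1, S4 = 0
    influence = a_hat * b_hat * (-A) + a_hat * YA + b_hat - plug_in
    return plug_in + influence.mean()


rng = np.random.default_rng(1)
n = 200_000
Z = rng.uniform(size=n)
b_true = 0.3 + 0.4 * Z           # P(Y = 1 | Z), so that EY = 0.5
a_true = 1.0 / (0.4 + 0.4 * Z)   # inverse of P(A = 1 | Z)
Y = rng.binomial(1, b_true)
A = rng.binomial(1, 1.0 / a_true)
YA = Y * A                       # the response is seen only when A = 1

b_bad = np.full(n, 0.2)          # deliberately poor initial estimate of b
naive = b_bad.mean()             # plug-in estimator, far from EY = 0.5
corrected = corrected_plug_in(a_true, b_bad, A, YA)
print(naive, corrected)
```

In this toy setting the correction term removes the error of the poor estimate of $b$ because $a$ is taken to be known exactly; with both $a$ and $b$ estimated, the remaining error is the subject of (1.14).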

Example 1.2 (Covariance model). Let a typical observation be a triple $X = (Y, A, Z)$, where $Y$ and $A$ are binary variables with values in $\{0, 1\}$. We are interested in estimating the expected conditional product moment $E[E(Y \mid Z)E(A \mid Z)]$. In terms of the parameters $a(Z) = E(A \mid Z)$ and $b(Z) = E(Y \mid Z)$, and $\eta = (a, b, f, c)$, for $f$ the marginal density of $Z$ and $c$ an additional parameter, this target can be written as

$\chi(\eta) = \int a b f\, d\nu.$

Representation (1.13) can be seen to hold with $S_1 = -1$, $S_2 = Y$, $S_3 = A$ and $S_4 = 0$. The parameters $a$ and $b$ are regression functions of $A$ and $Y$ on $Z$ and hence can be estimated at the rates $n^{-\alpha/(2\alpha+d)}$ and $n^{-\beta/(2\beta+d)}$ if they are a priori known to be $\alpha$- and $\beta$-smooth. The marginal density $f$ can be similarly estimated nonparametrically from the observed covariates.

The triple $(a, b, f)$ does not fully parameterize the joint distribution of an observation, but the remaining part $c$ of the parameter does not seem to play a role when estimating $\chi(\eta)$. A full parameterization is obtained by adding the treatment effect function $c(Z) = E(Y \mid A = 1, Z) - E(Y \mid A = 0, Z)$. The conditional distribution of $Y$ given $A$ and $Z$ can then be expressed in $(a, b, c, f)$ through $P(Y = 1 \mid A, Z) = c(Z)(A - a(Z)) + b(Z)$.

Estimating $\chi(\eta)$ is relevant to the biostatistical setup through a detour, which relates $\chi(\eta)$ to the treatment effect function $c$. First, in terms of statistical difficulty, the functional $\chi(\eta)$ is equivalent to the functional $E\,\mathrm{cov}(Y, A \mid Z) = E(YA) - \chi(\eta)$, as $E(YA)$ can be estimated at the rate $n^{-1/2}$ by a simple sample average. Second, the problem of estimating $E\,\mathrm{cov}(Y, A \mid Z)$ is a template for estimating $\psi(t) := E\,\mathrm{cov}(Y - tA, A \mid Z)$, for every given $t$, which can next be inverted to give an estimate for the value $\tau$ that satisfies $\psi(\tau) = 0$. The latter value can be shown to be equal to the variance weighted average treatment effect

$\tau = \frac{E\,\mathrm{var}(A \mid Z)\,c(Z)}{E\,\mathrm{var}(A \mid Z)}.$

(See [12], Section 4 for details.) Under the assumption of nonconfounding this parameter is nonzero if and only if the treatment $A$ has a nonzero causal effect, and it may be the ultimate purpose to ascertain this.
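
The formula for $\tau$ can be traced in two lines from the parameterization $P(Y = 1 \mid A, Z) = c(Z)(A - a(Z)) + b(Z)$ given above. The following is a sketch of that calculation, not a substitute for the argument in [12]:

```latex
\begin{align*}
\operatorname{cov}(Y, A \mid Z)
  &= \operatorname{cov}\bigl(E(Y \mid A, Z),\, A \mid Z\bigr)
   = \operatorname{cov}\bigl(c(Z)(A - a(Z)) + b(Z),\, A \mid Z\bigr)
   = c(Z)\operatorname{var}(A \mid Z), \\
\psi(t)
  &= E\operatorname{cov}(Y, A \mid Z) - t\, E\operatorname{var}(A \mid Z)
   = E\bigl[(c(Z) - t)\operatorname{var}(A \mid Z)\bigr].
\end{align*}
```

Setting $\psi(\tau) = 0$ and solving for $\tau$ gives the displayed ratio, provided $E\operatorname{var}(A \mid Z) > 0$.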


Example 1.3 (Average treatment effect). Suppose a clinical trial with two possible treatments, indicated by $A \in \{0, 1\}$, has two binary outcome variables $Y_1$ and $Y_2$, and let $a_j(Z) = E(Y_j \mid A = 1, Z) - E(Y_j \mid A = 0, Z)$ be the treatment effects at level $Z$ of an observed covariate, for $j = 1, 2$. We observe a random sample of the variables $(Y_1, Y_2, A, Z)$ and are interested in estimating the average treatment effect

$\chi(\eta) = \int a_1 a_2 f\, d\nu.$

Here $\eta$ parameterizes the distribution of $(Y_1, Y_2, A, Z)$, and $f$ is the density of the covariate $Z$, relative to some measure $\nu$, for instance, the Lebesgue measure on a compact subset of $\mathbb{R}^d$. The parameter $\eta$ includes the triplet $(a_1, a_2, f)$ and possibly other unknown aspects of the distribution of an observation. In a clinical trial the probability $\pi(Z) = P(A = 1 \mid Z)$ that an individual with covariate $Z$ is treated will be a known function of the covariate.

As the tangent space is a true subspace of the full tangent space, there are multiple influence functions for $\chi$. It can be shown that any influence function of $\chi$ can be represented in the form (1.13) with, for some measurable function $C$,

$S_1 = 1 - \frac{2A(A - \pi(Z))}{\pi(Z)(1 - \pi)(Z)},$

$S_2 = Y_2\, \frac{A - \pi(Z)}{\pi(Z)(1 - \pi)(Z)},$

$S_3 = Y_1\, \frac{A - \pi(Z)}{\pi(Z)(1 - \pi)(Z)},$

$S_4 = C(Z)\, \frac{A - \pi(Z)}{\pi(Z)(1 - \pi)(Z)}.$


Perhaps the special case that $Y_1 = Y_2$ is of most interest. The parameter $(a_1, a_2, f)$ then reduces to a pair $(a, f)$, and $S_2 = S_3$, but the general setup remains the same.


In models with first order influence function of the form (1.13) the error of the first order von Mises representation (1.2)-(1.3) can be computed to be, for a given initial estimator $\hat\eta = (\hat a, \hat b, \hat f)$,

(1.14)  $\chi(\hat\eta) - \chi(\eta) + P_\eta \chi^{(1)}_{\hat\eta} = \int (\hat a - a)(\hat b - b)\, s_{\eta,1} f\, d\nu,$

for $s_{\eta,j}(z) = E_\eta(S_j \mid Z = z)$. [From the fact that $a$, $b$ and $f$ are only nonparametrically restricted and that (1.13) gives the influence function, it can be shown that necessarily $s_{\eta,1} b + s_{\eta,2} = 0 = s_{\eta,1} a + s_{\eta,3}$, after which identity (1.14) follows by algebra.] This is quadratic in the errors $\hat a - a$ and $\hat b - b$ of the initial estimators, but is special in that the squares of the estimation errors $|\hat a - a|$ and $|\hat b - b|$ of the two initial estimators $\hat a$ and $\hat b$ do not arise, but only their product. This property, termed “double robustness” in [11, 13], means that in first order inference it suffices that one of the two parameters is estimated well. If initial estimators of $a$ and $b$ attain estimation rates $n^{-\alpha/(2\alpha+d)}$ and $n^{-\beta/(2\beta+d)}$, respectively, then the order of the remainder term in the expansion is the product of these rates. This shows that the linear estimator (1.4) attains a rate $O_P(n^{-1/2})$ if

(1.15)  $\frac{\alpha}{2\alpha + d} + \frac{\beta}{2\beta + d} \ge \frac{1}{2}.$

If this condition fails, then the “bias” (1.14) is greater than $O_P(n^{-1/2})$. The linear estimator (1.4) then does not balance bias and variance and is suboptimal.
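
The product structure in (1.14) can be verified numerically in the setting of Example 1.1, where $s_{\eta,1}(z) = E(-A \mid Z = z) = -1/a(z)$. The sketch below evaluates both sides by quadrature; the particular functions $a$, $b$, $f$ and the perturbations standing in for estimation errors are hypothetical choices made only for this illustration.

```python
import numpy as np


def integrate(values, z):
    """Trapezoidal rule for values sampled on the grid z."""
    return float(np.sum((values[1:] + values[:-1]) * np.diff(z)) / 2.0)


z = np.linspace(0.0, 1.0, 2001)
f = np.ones_like(z)            # density of Z (uniform on [0, 1])
b = 0.3 + 0.4 * z              # b(z) = P(Y = 1 | Z = z)
a = 1.0 / (0.4 + 0.4 * z)      # a(z) = 1 / P(A = 1 | Z = z)


def remainder_sides(a_hat, b_hat):
    """Return (left side, right side) of (1.14) for Example 1.1."""
    # E[a_hat*A*(Y - b_hat) + b_hat | Z = z] = (a_hat/a)*(b - b_hat) + b_hat,
    # so the left side is the integral of that minus b, against f.
    lhs = integrate(((a_hat / a) * (b - b_hat) + b_hat - b) * f, z)
    # s_{eta,1}(z) = E(S_1 | Z = z) = E(-A | Z = z) = -1 / a(z)
    rhs = integrate((a_hat - a) * (b_hat - b) * (-1.0 / a) * f, z)
    return lhs, rhs


a_hat = a + 0.3 * np.sin(6 * z)    # hypothetical errors in the initial estimators
b_hat = b + 0.1 * np.cos(4 * z)
print(remainder_sides(a_hat, b_hat))   # the two sides agree
print(remainder_sides(a, b_hat))       # a known exactly: the remainder vanishes
```

The second call illustrates double robustness: an arbitrary error in $\hat b$ leaves no first order remainder when $\hat a = a$, and symmetrically for the roles of $a$ and $b$.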

For moderate to large dimensions $d$, inequality (1.15) is a restrictive requirement, whose validity is questionable for many applications. Higher order influence functions allow one to construct better estimators than the linear estimator (1.4). As shown in [5, 9, 10, 12, 15], there are two cases:

• $(\alpha + \beta)/2 \ge d/4$. In this case estimation at rate $n^{-1/2}$ is possible by using a higher order estimator (1.10) of sufficiently large order $m$. If the inequality is strict, then this estimator is also semiparametrically regular and efficient, even though (1.15) need not be satisfied.

• $(\alpha + \beta)/2 < d/4$. In this case the minimax rate of estimation is slower than $n^{-1/2}$. If the function through which the density $f$ enters [the quotient $f/a$ in Example 1.1] has a regularity $\gamma$ bigger than a certain cutoff [that depends on $(\alpha, \beta)$], then the minimax rate is $n^{-(2\alpha+2\beta)/(2\alpha+2\beta+d)}$, attainable by a higher order estimator (1.10) with a carefully constructed approximate influence function.
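
The two cases, and their relation to condition (1.15), can be summarized in a few lines of code. This is only a bookkeeping sketch with function names of our choosing; in the second case the returned exponent is the minimax exponent only under the additional smoothness condition on $\gamma$ stated in the text.

```python
def first_order_condition(alpha: float, beta: float, d: int) -> bool:
    """Condition (1.15): the linear estimator (1.4) attains rate n**(-1/2)."""
    return alpha / (2 * alpha + d) + beta / (2 * beta + d) >= 0.5


def higher_order_rate_exponent(alpha: float, beta: float, d: int) -> float:
    """Exponent r of the rate n**(-r) in the two cases of the text.

    In the second case this presumes that gamma exceeds the cutoff
    mentioned in the text.
    """
    if (alpha + beta) / 2 >= d / 4:
        return 0.5
    return (2 * alpha + 2 * beta) / (2 * alpha + 2 * beta + d)


for alpha, beta, d in [(5.0, 5.0, 10), (3.0, 3.0, 10), (1.0, 1.0, 10)]:
    print(alpha, beta, d,
          first_order_condition(alpha, beta, d),
          round(higher_order_rate_exponent(alpha, beta, d), 3))
```

With $\alpha = \beta = 3$ and $d = 10$, condition (1.15) fails while the first case still applies, which is the regime in which higher order estimators improve on the linear estimator without loss of the rate $n^{-1/2}$.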


In both cases it is necessary to estimate the marginal density $f$, or rather the function through which it enters the bias (1.14) [the quotient $f/a$ in Example 1.1], notwithstanding the fact that it does not enter the first order influence function (1.13). Robins et al. [9] construct minimax estimators under the assumption that this function has a minimal smoothness. A completely general solution is apparently still more complicated.

The details of the constructions are beyond the scope of the present paper. The approximations are based on expanding the parameters $a$ and $b$ on bases that express their regularity (e.g., suitable wavelets) and representing the higher order derivatives of the functional $\chi$ on the subspaces obtained by truncating these bases. The truncation point is chosen relative to the functional to be estimated (and not necessarily the usual one used to estimate the functions themselves). For orders three and up, it is in addition necessary to remove pairs of basis functions [resulting from the pair $(a, b)$] whose combined index is “large”, in order to cut variance without increasing bias. For an introduction to constructing truncated second order influence functions we refer to [10].

2. CONCLUDING REMARKS 

One may look at the work of Robins et al. [9] and its sequel from two perspectives. The mathematical statistical point of view is the simplest: higher order estimating functions are a means to construct estimators that are theoretically minimax in complex semiparametric models, where the interest is not simply in a mean of the observations, but in a parameter defined through the structure of the model. As always in high-dimensional models, minimaxity is about the bias-variance trade-off. Inspection of higher order tangent spaces reveals in what form the bias arises, and the connected von Mises calculus allows one to correct for it. So far no completely general method exists for trading this against variance (other than the abstract idea to use “finite-dimensional approximations”), and, in fact, beyond the application to models characterized by (1.13), nothing much is known.

The second perspective is practically oriented. The models dealt with in this paper are relevant in studies in epidemiology, econometrics and the social sciences. The parameter of interest is defined through the substantive application, for instance, measuring a response to treatment or the consequence of an intervention. High dimensions arise to identify this parameter of interest from data. Observational studies, where covariates must be included in the statistical analysis to control for possible confounding, are a typical case. One has a choice to adopt a relatively simple statistical model for this complex reality, maybe even a classical parametric model or a one-dimensional propensity score, or to let the data “speak for itself”, as much as possible. Without any model restriction one runs into the “curse of dimensionality” and no conclusions are possible. Semiparametric models as developed in the 1980s and 1990s are between these extremes, but from the present perspective relatively close to finite-dimensional models. In fact, they focus on functionals in situations where a bias-variance trade-off is unnecessary, as the bias is negligible. The main purpose of methods based on high-dimensional influence functions is to fill the huge gap between “classical semiparametric models” and the model in which nothing is assumed. In a situation with fewer or less stringent a priori assumptions on the model, statistical bias starts playing a role and must be traded off against variance. Estimators with bigger standard errors result, but bias due to model misspecification decreases. The choice between model bias with smaller variance and larger estimation variance is not easy to make with current statistical methodology. However, larger and larger databases certainly make the methodology of higher order influence functions feasible.

Thus, these methods are potentially useful to answer a wide range of questions. We close with some remarks about further research that needs to be done to make the methods fully operational.

The improved estimators based on higher order influence functions combine good preliminary estimators for deviations of the parameter of interest $\chi(\eta)$ in some directions with a priori assumptions that the deviations in other “nonestimable” directions are small. The latter a priori assumptions are always questionable. It is an open problem to develop estimation procedures that can “adapt” to “scales of a priori conditions”, for instance, by implicitly estimating unknown smoothness levels from the data.

For practical application, estimation without error indications is insufficient. Although there is some preliminary work on confidence intervals related to the higher order estimators, these procedures remain to be explored.

The models (1.13) considered in [9] are structured semiparametric models [with a partitioned parameter $(a, b, c, f)$ and the functional of interest defined naturally in terms of the partition], but typically nonparametric in the sense that any law on the sample space is realized by some choice of the parameters $(a, b, c, f)$. Genuinely semiparametric problems, such as partial linear regression, pose a further challenge. For such models the first order influence function is nonunique and, as the estimation error is bigger than the first order variance, the efficient first order influence function may not play a special role, thus increasing the degrees of freedom in constructing suitable higher order influence functions.